Generalized Linear Mixed Models

An Analysis on National Maternal Mortality

Carolyn Herrera & Catherine Funte (Advisor: Dr. Cohen)

2025-08-05

Introduction

Definition

  • Generalized Linear Mixed Models (GLMMs) are a flexible class of statistical models that combine the features of two powerful tools: Generalized Linear Models (GLMs) and Mixed-Effects Models (Agresti 2015)

  • Can model non-normal outcome variables, such as binary, count, or proportion data

  • Incorporate random effects, which account for variation due to grouping or clustering in the data, correlated observations, and overdispersion

When are they useful?

  • Handling hierarchical or grouped data (e.g., students within classrooms, patients within clinics) (Lee and Nelder 1996)

  • Modeling non-normal outcomes, such as binary, count, or proportion data

  • Improving inference by accounting for both fixed effects (predictors of interest) and random effects (random variation across groups)

  • Reducing bias and inflated Type I error rates that can result from ignoring data structure (Thompson et al. 2022)

Past Research Utilizing GLMM

  • Frequently used in fields like medicine, ecology, education, and social sciences

  • One study explores the benefits of a zero-inflated Poisson GLMM (used to handle count data with an overabundance of zeroes) applied to maternal mortality data in Ghana (Tawiah, Iddi, and Lotsi 2020)

  • Another study uses GLMM to investigate the effect of particulate matter on child and maternal mortality globally

Application to Social Science

  • We investigate the potential dangers of a post-Roe v. Wade world for mothers, and the ongoing systemic issues that women of color face when giving birth.
  • We use a generalized linear mixed model (GLMM) to analyze the risks facing mothers who give birth in the United States.
  • With our results, we hope to better identify the relationships between maternal mortality and explanatory variables such as age group, ethnicity, and the Dobbs era in which the deaths took place.
  • These findings are relevant because they could serve as evidence to help reduce maternal mortality among certain groups, and to investigate whether the post-Dobbs era is associated with a higher maternal mortality rate.

Methods

Math Background

  • GLMMs can be viewed either as an extension of GLMs that adds random effects, or as an extension of Linear Mixed Models (LMMs), in which a linear model with fixed and random effects is extended to non-normal distributions (Salinas Ruíz et al. 2023).

Let

  • \(\mathbf{y}\) be an \(N \times 1\) column vector of the outcome variable

  • \(\mathbf{X}\) be an \(N \times p\) matrix of the \(p\) predictor variables

  • \(\boldsymbol{\beta}\) be a \(p \times 1\) column vector of the fixed-effects coefficients

  • \(\mathbf{Z}\) be an \(N \times q\) design matrix for the \(q\) random effects

  • \(\mathbf{u}\) be a \(q \times 1\) vector of random effects, and

  • \(\boldsymbol{\epsilon}\) be an \(N \times 1\) column vector of the residuals

Then the general equation for the model is given by:

\[\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}+\boldsymbol{\epsilon}\]

The equation above describes a normally distributed response. For other outcome distributions, each of which has its own probability function, a GLMM uses a link function that relates the expected value of the response variable \(\mathbf{y}\) to a linear predictor, \(\boldsymbol{\eta}\). The linear predictor contains the fixed and random effects with their unknown parameters, but excludes the residuals: \[\boldsymbol{\eta}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}\] In this analysis we use the Negative Binomial distribution.
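As a minimal sketch of how the linear predictor is assembled from the quantities defined above, the following Python snippet computes the fixed part \(\mathbf{X}\boldsymbol{\beta}\) plus the random part \(\mathbf{Z}\mathbf{u}\) for a toy data set with three observations, two fixed-effect columns, and two groups. All values are invented for illustration and are not taken from the maternal mortality data.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def linear_predictor(X, beta, Z, u):
    """Return eta = X beta + Z u, one value per observation."""
    fixed_part = matvec(X, beta)
    random_part = matvec(Z, u)
    return [f + r for f, r in zip(fixed_part, random_part)]


# Toy example (hypothetical values): N = 3 observations, p = 2, q = 2.
X = [[1.0, 0.0],   # intercept column plus one binary predictor
     [1.0, 1.0],
     [1.0, 1.0]]
beta = [-2.0, 0.5]  # fixed-effect coefficients
Z = [[1.0, 0.0],   # indicator columns: which group each row belongs to
     [1.0, 0.0],
     [0.0, 1.0]]
u = [0.1, -0.1]     # random intercepts, one per group

eta = linear_predictor(X, beta, Z, u)
print(eta)
```

Rows that share a group receive the same random intercept, which is how the model represents correlation among observations in that group.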

The link function is \(g(\cdot)\), where \[g(E(\mathbf{y}))=\boldsymbol{\eta}\] and \(E(\mathbf{y})\) is the expectation of \(\mathbf{y}\). The choice of link function depends on the outcome distribution. Our data are overdispersed counts, which we model with a Negative Binomial distribution, so we use a log link function.
\[g(\cdot)=\log_e(\cdot)\]

Negative Binomial Distribution

\[ f(y;k,\mu)=\frac{\Gamma(y+k)}{\Gamma(k)\,\Gamma(y+1)}\left(\frac{k}{\mu+k}\right)^{k}\left(1-\frac{k}{\mu+k}\right)^{y} \]

The mean of the Negative Binomial is \(E(Y)=\mu\) and the variance is \(Var(Y)=\mu+\frac{\mu^2}{k}\). The second term of the variance represents the overdispersion, which is controlled by the dispersion parameter \(k\). If \(k\) is large relative to \(\mu^2\), the second term is approximately zero and a Poisson distribution may as well be used. The smaller the value of \(k\), the larger the overdispersion, and the Negative Binomial is then the appropriate distribution to use with the log link.
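The following Python sketch evaluates the Negative Binomial probability function in its mean and dispersion form, with \(\Gamma(k)\,\Gamma(y+1)\) in the denominator, and checks numerically that it sums to one with mean \(\mu\) and variance \(\mu+\mu^2/k\). The values of \(\mu\) and \(k\) are hypothetical, chosen only to make the check easy to follow.

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and dispersion parameter k."""
    # Work on the log scale so the gamma functions do not overflow.
    log_coef = math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
    p = k / (mu + k)
    return math.exp(log_coef + k * math.log(p) + y * math.log(1.0 - p))


def nb_variance(mu, k):
    """Variance implied by the mean and dispersion parameter."""
    return mu + mu ** 2 / k


mu, k = 4.0, 2.0       # hypothetical values for illustration
support = range(500)   # the tail beyond 500 is negligible for these values

total = sum(nb_pmf(y, mu, k) for y in support)
mean = sum(y * nb_pmf(y, mu, k) for y in support)
var = sum((y - mean) ** 2 * nb_pmf(y, mu, k) for y in support)

print(f"sum of pmf = {total:.6f}")
print(f"mean = {mean:.6f}, variance = {var:.6f}")
print(f"formula variance = {nb_variance(mu, k):.6f}")
# With a very large k the variance approaches the Poisson variance, mu.
print(f"variance with k = 1e6: {nb_variance(mu, 1e6):.6f}")
```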

Assumptions

  • The response variable and the predictors have a linear relationship within the levels of random effects.

  • The response variable is assumed to follow a negative binomial distribution, with \(\sigma^2>\mu\).

  • The residuals and random effects are independent.

  • The random effects are assumed to be normally distributed, with mean 0 and variance \(\sigma_u^2\).

Model Choice - Negative Binomial GLMM

  • The Negative Binomial is ideal for count data that is overdispersed (which we suspect, as it is population data)

  • Observations collected over time are not independent, so a GLMM is needed in order to include time (year) as a random effect

  • Accounts for variation in the model that would not be explained by our fixed effects

  • Analysis performed with R (R Core Team 2025)

Data Exploration and Visualization

About the Dataset

  • Vital Statistics Rapid Release (VSRR) Provisional Maternal Death Counts and Rates, in the form of a .csv

  • Published by National Vital Statistics System, a collaboration between the National Center for Health Statistics (NCHS) and state vital record offices

  • Monthly death counts and death rates by race/ethnicity, age, and overall

  • Data from January 2019 to December 2024

  • Data is provisional and updated quarterly; becomes more reliable with more updates

  • Maternal Deaths between 1 and 9 are suppressed for privacy reasons

Missing Data

  • “Native Hawaiian or Other Pacific Islander, Non-Hispanic” has 70 NAs for Maternal Mortality, so we omit this subgroup entirely

  • “American Indian or Alaska Native, Non-Hispanic” has 58 NAs for Maternal Mortality Rate; because we do not use the rate in our model, these missing values do not affect modeling

Exploratory Data Analysis

Modeling and Results

Data Preprocessing

  • Renamed column names
  • Filtered out maternal deaths less than zero and live births less than 100
  • Removed maternal death NA values
  • Derived the variables Ethnicity and Age_Group from the variables Group and Subgroup
  • Created a binary variable called Dobbs_Era with values “Pre-Dobbs” and “Post-Dobbs”, based on the overturning of Roe v. Wade by the Dobbs decision, with a cut-off date of June 24, 2022
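The preprocessing steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the analysis (which was done in R): the `Date` column name, the example rows, and the rule that an observation dated on or after the cutoff counts as “Post-Dobbs” are all hypothetical, and the cutoff is set to the date of the Dobbs decision.

```python
from datetime import date

# Assumption: the cutoff is the date of the Dobbs decision, and observations
# dated on or after it are labeled "Post-Dobbs".
DOBBS_CUTOFF = date(2022, 6, 24)


def dobbs_era(observation_date, cutoff=DOBBS_CUTOFF):
    """Label an observation date as 'Pre-Dobbs' or 'Post-Dobbs'."""
    return "Post-Dobbs" if observation_date >= cutoff else "Pre-Dobbs"


def keep_row(row):
    """Keep rows with a non-missing, non-negative death count and >= 100 births."""
    deaths = row["Maternal_Deaths"]
    return deaths is not None and deaths >= 0 and row["Live_Births"] >= 100


# Hypothetical rows; the counts are invented and do not come from the VSRR file.
rows = [
    {"Date": date(2021, 3, 1), "Maternal_Deaths": 40, "Live_Births": 5000},
    {"Date": date(2022, 9, 1), "Maternal_Deaths": None, "Live_Births": 5200},
    {"Date": date(2023, 1, 1), "Maternal_Deaths": 35, "Live_Births": 4800},
]

# Drop unusable rows, then attach the era label to each remaining row.
cleaned = [dict(row, Dobbs_Era=dobbs_era(row["Date"])) for row in rows if keep_row(row)]
for row in cleaned:
    print(row["Date"], row["Dobbs_Era"])
```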

Modeling

  • Fit models with different combinations of the fixed effects Ethnicity, Age Group, and Dobbs Era, with Year as a random effect, and with or without the log of Live Births as an offset
  • We first tested a Poisson model and found significant evidence of overdispersion (as expected) and hence used a GLMM with Negative Binomial family
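A short worked equation may help show why the log of Live Births enters as an offset. Writing \(\mathbf{x}_i^\top\boldsymbol{\beta}\) for the fixed-effects part of observation \(i\) and \(b_{\text{Year}[i]}\) for its random year intercept (notation introduced here for illustration), the offset has a coefficient fixed at 1, so moving it to the left-hand side turns the count model into a model for the rate of deaths per live birth:

\[ \log\left(E[\text{Maternal Deaths}_i]\right)=\log(\text{Live Births}_i)+\mathbf{x}_i^\top\boldsymbol{\beta}+b_{\text{Year}[i]} \quad\Longrightarrow\quad \frac{E[\text{Maternal Deaths}_i]}{\text{Live Births}_i}=\exp\left(\mathbf{x}_i^\top\boldsymbol{\beta}+b_{\text{Year}[i]}\right) \]

Without the offset, the coefficients describe differences in raw death counts, which also reflect how many births occur in each group.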

Models

Name                              Fixed_Effects                    Random_Effects  Offset
all_glmmodel_nb                   Ethnicity, Age_Group, Dobbs_Era  Year            log(Live_Births)
ethnicity_agegroup_glmmodel_nb    Ethnicity, Age_Group             Year            log(Live_Births)
allno_glmmodel_nb                 Ethnicity, Age_Group, Dobbs_Era  Year            None
ethnicity_agegroupno_glmmodel_nb  Ethnicity, Age_Group             Year            None

Models

 Family: nbinom2  ( log )
Formula:          
Maternal_Deaths ~ Ethnicity + Age_Group + Dobbs_Era + (1 | Year)
Data: deaths_df3
 Offset: log(Live_Births)

      AIC       BIC    logLik -2*log(L)  df.resid 
   4567.5    4614.2   -2272.7    4545.5       507 

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 Year   (Intercept) 0.03158  0.1777  
Number of obs: 518, groups:  Year, 6

Dispersion parameter for nbinom2 family ():  148 

Conditional model:
                                                        Estimate Std. Error
(Intercept)                                             -8.79687    0.07678
EthnicityBlack, Non-Hispanic                             1.31563    0.02627
EthnicityWhite, Non-Hispanic                             0.28329    0.02604
EthnicityHispanic                                        0.16204    0.02704
EthnicityAmerican Indian or Alaska Native, Non-Hispanic  1.65241    0.06285
EthnicityUnknown                                         0.04364    0.02750
Age_Group25-39 years                                     0.38817    0.01821
Age_Group40 years and over                               1.81031    0.02053
Age_GroupUnknown                                              NA         NA
Dobbs_EraPost-Dobbs                                     -0.22081    0.02303
                                                        z value Pr(>|z|)    
(Intercept)                                             -114.57  < 2e-16 ***
EthnicityBlack, Non-Hispanic                              50.08  < 2e-16 ***
EthnicityWhite, Non-Hispanic                              10.88  < 2e-16 ***
EthnicityHispanic                                          5.99 2.06e-09 ***
EthnicityAmerican Indian or Alaska Native, Non-Hispanic   26.29  < 2e-16 ***
EthnicityUnknown                                           1.59    0.113    
Age_Group25-39 years                                      21.31  < 2e-16 ***
Age_Group40 years and over                                88.17  < 2e-16 ***
Age_GroupUnknown                                             NA       NA    
Dobbs_EraPost-Dobbs                                       -9.59  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model Choice

Our Chosen model in Regression equation format:

\[ \begin{aligned} \log(\mathbb{E}[\text{Maternal Deaths}_i]) &= \beta_0 + \beta_1 \cdot \text{Black}_i \\ &+ \beta_2 \cdot \text{White}_i + \beta_3 \cdot \text{Hispanic}_i \\ &+ \beta_4 \cdot \text{American Indian or Alaska Native}_i \\ &+ \beta_5 \cdot \text{EthnicityUnknown}_i \\ &+ \beta_6 \cdot \text{Age 25-39}_i + \beta_7 \cdot \text{Age 40 Plus}_i \\ &+ \beta_8 \cdot \text{Post Dobbs}_i + b_{\text{Year}[i]} + \log(\text{Live Births}_i) \end{aligned} \]

Assumption Checking

  • Our chosen model adheres to all assumptions of a GLMM
  • The response variable and the predictors have a linear relationship within the levels of random effects.

Assumption Checking

  • The response variable is assumed to follow a negative binomial distribution, with \(\sigma^2>\mu\).
  mean(Maternal_Deaths) var(Maternal_Deaths)
1              828.8889             30819.62
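As a rough illustration of what these two numbers imply, the sketch below computes the variance-to-mean ratio and a method-of-moments value of \(k\) from \(Var(Y)=\mu+\mu^2/k\). These are marginal summaries that ignore the predictors and the offset, so the resulting \(k\) is only a descriptive quantity and is not expected to match the dispersion parameter estimated by the fitted model.

```python
def dispersion_ratio(mean, variance):
    """Variance-to-mean ratio; a value near 1 would be consistent with Poisson."""
    return variance / mean


def moment_k(mean, variance):
    """Solve variance = mean + mean**2 / k for k (requires variance > mean)."""
    if variance <= mean:
        raise ValueError("no overdispersion: variance must exceed the mean")
    return mean ** 2 / (variance - mean)


# Sample mean and variance of Maternal_Deaths as reported above.
sample_mean = 828.8889
sample_var = 30819.62

ratio = dispersion_ratio(sample_mean, sample_var)
k_hat = moment_k(sample_mean, sample_var)
print(f"variance / mean = {ratio:.1f}")
print(f"method-of-moments k = {k_hat:.1f}")
```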

Assumption Checking

  • The residuals and random effects are independent.

Assumption Checking

  • The random effects are assumed to be normally distributed, with mean 0 and variance \(\sigma_u^2\).
  • The random effect of year is approximately normal, with some mild deviations, which is common with count models.

Model Characteristics

GLMM negative binomial model with offset (Log Estimates)
Outcome: Maternal Deaths

Predictors                                                  Log-Mean  Std. Error  CI             Statistic  p
(Intercept)                                                 -8.80     0.08        -8.95 – -8.65  -114.57    <0.001
Ethnicity [Black, Non-Hispanic]                             1.32      0.03        1.26 – 1.37    50.08      <0.001
Ethnicity [White, Non-Hispanic]                             0.28      0.03        0.23 – 0.33    10.88      <0.001
Ethnicity [Hispanic]                                        0.16      0.03        0.11 – 0.22    5.99       <0.001
Ethnicity [American Indian or Alaska Native, Non-Hispanic]  1.65      0.06        1.53 – 1.78    26.29      <0.001
Ethnicity [Unknown]                                         0.04      0.03        -0.01 – 0.10   1.59       0.113
Age Group [25-39 years]                                     0.39      0.02        0.35 – 0.42    21.31      <0.001
Age Group [40 years and over]                               1.81      0.02        1.77 – 1.85    88.17      <0.001
Dobbs Era [Post-Dobbs]                                      -0.22     0.02        -0.27 – -0.18  -9.59      <0.001

Random Effects
σ²                            8.00
τ00 Year                      0.03
ICC                           0.00
N Year                        6
Observations                  518
Marginal R² / Conditional R²  0.056 / 0.059

Model Characteristics

GLMM negative binomial model with offset (Incidence Rate Ratios)
Outcome: Maternal Deaths

Predictors                                                  Incidence Rate Ratios  CI           p
(Intercept)                                                 0.00                   0.00 – 0.00  <0.001
Ethnicity [Black, Non-Hispanic]                             3.73                   3.54 – 3.92  <0.001
Ethnicity [White, Non-Hispanic]                             1.33                   1.26 – 1.40  <0.001
Ethnicity [Hispanic]                                        1.18                   1.12 – 1.24  <0.001
Ethnicity [American Indian or Alaska Native, Non-Hispanic]  5.22                   4.61 – 5.90  <0.001
Ethnicity [Unknown]                                         1.04                   0.99 – 1.10  0.113
Age Group [25-39 years]                                     1.47                   1.42 – 1.53  <0.001
Age Group [40 years and over]                               6.11                   5.87 – 6.36  <0.001
Dobbs Era [Post-Dobbs]                                      0.80                   0.77 – 0.84  <0.001

Random Effects
σ²                            8.00
τ00 Year                      0.03
ICC                           0.00
N Year                        6
Observations                  518
Marginal R² / Conditional R²  0.056 / 0.059
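The incidence rate ratios in this table are the exponentiated log estimates from the model summary. The Python sketch below reproduces three of them from the reported estimates and standard errors. The Wald interval with \(z = 1.96\) is an assumption about how the intervals were formed; it happens to match the tabulated values for these rows to two decimals.

```python
import math


def irr_with_ci(estimate, std_error, z=1.96):
    """Exponentiate a log-scale estimate and its Wald confidence limits."""
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return math.exp(estimate), lower, upper


# (estimate, standard error) pairs copied from the model summary.
coefficients = {
    "Ethnicity [Black, Non-Hispanic]": (1.31563, 0.02627),
    "Age Group [40 years and over]": (1.81031, 0.02053),
    "Dobbs Era [Post-Dobbs]": (-0.22081, 0.02303),
}

results = {name: irr_with_ci(est, se) for name, (est, se) in coefficients.items()}
for name, (irr, lower, upper) in results.items():
    print(f"{name}: IRR {irr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An IRR above 1 corresponds to a higher rate than the reference group, and the percent difference is \(100\,(\text{IRR}-1)\).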

Model Characteristics (without offset model)

Fixed Effects from GLMM without offset (Incidence Rate Ratios)
Outcome: Maternal Deaths

Predictors                                                  Incidence Rate Ratios  CI             p
(Intercept)                                                 33.65                  29.12 – 38.88  <0.001
Ethnicity [Black, Non-Hispanic]                             8.67                   8.24 – 9.13    <0.001
Ethnicity [White, Non-Hispanic]                             11.09                  10.54 – 11.67  <0.001
Ethnicity [Hispanic]                                        4.81                   4.56 – 5.07    <0.001
Ethnicity [American Indian or Alaska Native, Non-Hispanic]  0.62                   0.54 – 0.70    <0.001
Ethnicity [Unknown]                                         3.80                   3.60 – 4.01    <0.001
Age Group [25-39 years]                                     4.95                   4.78 – 5.12    <0.001
Age Group [40 years and over]                               1.04                   1.00 – 1.08    0.058
Dobbs Era [Post-Dobbs]                                      0.80                   0.77 – 0.84    <0.001

Random Effects
σ²                            0.01
τ00 Year                      0.03
ICC                           0.73
N Year                        6
Observations                  518
Marginal R² / Conditional R²  0.957 / 0.988

Model Outcomes

  • Black women had a 273% higher rate of maternal mortality compared to the reference group (Asian), holding all other predictors constant
  • White women had a 33% higher rate of maternal mortality compared to the reference group (Asian), holding all other predictors constant
  • Hispanic women had an 18% higher rate of maternal mortality compared to the reference group (Asian), holding all other predictors constant
  • American Indian or Alaska Native women had a 422% higher rate of maternal mortality compared to the reference group (Asian), holding all other predictors constant
  • Unknown Ethnicity was not statistically significant
  • Women aged 25-39 had a 47% higher rate of maternal mortality compared to the reference group (under 25), holding all other predictors constant
  • Women aged 40 and over had a 511% higher rate of maternal mortality compared to the reference group (under 25), holding all other predictors constant

Model Outcomes

  • The rate of maternal mortality was 20% lower after the Dobbs decision than before it
  • The residual variance (σ²) is about 8.0; this is the variance in maternal deaths not explained by the fixed or random effects
  • The marginal R² shows that the fixed effects alone explain 5.6% of the variance in maternal deaths; the conditional R² shows that the fixed and random effects together explain 5.9%, so the random effect of year contributes little to the model
  • The low proportion of explained variance could be due to the lack of a random effect such as state, or of additional fixed effects
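To make the small contribution of the year effect concrete, the ICC reported for the offset model can be reproduced from the year variance in the model summary (0.03158) and the residual variance (8.00), assuming the usual definition of the ICC as the share of total variance attributable to the grouping factor:

\[ \text{ICC}=\frac{\tau_{00}}{\tau_{00}+\sigma^2}=\frac{0.03158}{0.03158+8.00}\approx 0.004 \]

This rounds to the reported value of 0.00, meaning that almost none of the unexplained variation lies between years.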

Model Outcomes - With and Without Offsets

Limitations

  • The data used is provisional and updates quarterly with both new and old counts, so further analysis may offer differing results

  • The data offered counts by age group and by ethnicity, but not by both together (e.g., maternal deaths for Black women 40 and over). The inclusion of such data would give a better indication of the relationship between the two subgroups.

  • Due to the Covid-19 pandemic’s impact on the healthcare system, access to regular healthcare was restricted. This likely had an impact on maternal mortality and may partially account for increased rates from 2020-2023.

  • More variables in the dataset would offer a better picture of the predictors of maternal mortality, specifically with regard to their relationship with ethnicity. Prenatal health, healthcare access, abortion access, and other prenatal behaviors would be useful.

Conclusions and Further Study

  • American Indian or Alaska Native women have the highest maternal mortality rate, despite having the smallest number of maternal deaths
  • Black women had the second highest maternal mortality rate, and the second highest total maternal deaths
  • Based on the provisional data, there is no evidence that the maternal mortality rate increased post-Dobbs
  • A study utilizing state-specific data (due to differing laws regarding abortion) might give a clearer indication of the impact of Dobbs on maternal mortality and whether abortion access is a factor
  • With additional data, this GLMM could be improved to identify significant explanatory variables, which could in turn inform efforts to reduce racial disparities in maternal mortality
  • A study investigating healthcare access for expecting mothers specific to the Covid-19 pandemic would give insight into the spike in maternal mortality rate in 2022

References

Agresti, A. 2015. Foundations of Linear and Generalized Linear Models. Wiley Series in Probability and Statistics. Wiley. https://books.google.com/books?id=jlIqBgAAQBAJ.
Candy, Steven G. 2000. “The Application of Generalized Linear Mixed Models to Multi-Level Sampling for Insect Population Monitoring.” Environmental and Ecological Statistics 7 (3): 217–38.
Lee, Youngjo, and John A Nelder. 1996. “Hierarchical Generalized Linear Models.” Journal of the Royal Statistical Society Series B: Statistical Methodology 58 (4): 619–56.
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Salinas Ruíz, Josafhat, Osval Antonio Montesinos López, Gabriela Hernández Ramírez, and Jose Crossa Hiriart. 2023. Generalized Linear Mixed Models with Applications in Agriculture and Biology. Springer Nature.
Tawiah, Kassim, Samuel Iddi, and Anani Lotsi. 2020. “On Zero-Inflated Hierarchical Poisson Models with Application to Maternal Mortality Data.” International Journal of Mathematics and Mathematical Sciences 2020 (1): 1407320.
Thompson, Jennifer A, Clemence Leyrat, Katherine L Fielding, and Richard J Hayes. 2022. “Cluster Randomised Trials with a Binary Outcome and a Small Number of Clusters: Comparison of Individual and Cluster Level Analysis Method.” BMC Medical Research Methodology 22 (1): 222.
Wang, Ke-Sheng, Xuefeng Liu, Muyiwa Ategbole, Xin Xie, Ying Liu, Chun Xu, Changchun Xie, and Zhanxin Sha. 2017. “Generalized Linear Mixed Model Analysis of Urban-Rural Differences in Social and Behavioral Factors for Colorectal Cancer Screening.” Asian Pacific Journal of Cancer Prevention: APJCP 18 (9): 2581.